GPT-2 model
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Michigan (0.04)
- North America > United States > California > Riverside County > Riverside (0.04)
- Asia > Middle East > Jordan (0.04)
- Workflow (1.00)
- Research Report > New Finding (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
- North America > United States > Utah > Utah County > Provo (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- (5 more...)
- Research Report > Promising Solution (0.46)
- Research Report > New Finding (0.46)
- Government (0.67)
- Media > News (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Communications (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.74)
Dynamic Base Model Shift for Delta Compression
Huang, Chenyu, Ye, Peng, Zheng, Shenghe, Wang, Xiaohui, Bai, Lei, Chen, Tao, Ouyang, Wanli
Transformer-based models trained with the pretrain-finetune paradigm have brought significant progress, along with heavy storage and deployment costs for finetuned models across multiple tasks. Delta compression attempts to lower these costs by reducing the redundancy of delta parameters (i.e., the difference between the finetuned and pre-trained model weights) through pruning or quantization. However, existing methods by default employ the pre-trained model as the base model and compress the delta parameters for every task, which may cause significant performance degradation, especially when the compression rate is extremely high. To tackle this issue, we investigate the impact of different base models on the performance of delta compression and find that the pre-trained base model is rarely optimal. To this end, we propose Dynamic Base Model Shift (DBMS), which dynamically adapts the base model to the target task before performing delta compression. Specifically, we adjust two parameters, which respectively determine the magnitude of the base model shift and the overall scale of delta compression, to boost the compression performance on each task. Through low-cost learning of these two parameters, DBMS can maintain most of the finetuned model's performance even under an extremely high compression ratio, significantly surpassing existing methods. Moreover, DBMS is orthogonal to and can be integrated with a variety of other methods, and it has been evaluated across different types of models, including language, vision transformer, and multi-modal models.
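A minimal sketch of the base-model-shift idea described above, with magnitude pruning standing in for the compression step; the coefficient names `alpha` (shift magnitude) and `beta` (delta rescaling) and the pruning-based compression are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def compress_delta(delta: np.ndarray, keep_ratio: float = 0.02) -> np.ndarray:
    """Stand-in compression: magnitude pruning that keeps the largest entries."""
    k = max(1, int(delta.size * keep_ratio))
    threshold = np.partition(np.abs(delta).ravel(), -k)[-k]
    return np.where(np.abs(delta) >= threshold, delta, 0.0)

def dbms_reconstruct(w_pre: np.ndarray, w_ft: np.ndarray,
                     alpha: float, beta: float,
                     keep_ratio: float = 0.02) -> np.ndarray:
    """Shift the base model toward the finetuned weights by alpha, compress the
    remaining delta, and rescale the compressed delta by beta when adding it back."""
    delta = w_ft - w_pre
    w_base = w_pre + alpha * delta                 # task-adapted base model
    compressed = compress_delta(w_ft - w_base, keep_ratio)
    return w_base + beta * compressed              # reconstructed task weights
```

In the paper the two coefficients are learned per task at low cost; here they are simply supplied by the caller.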
AI-generated Text Detection with a GLTR-based Approach
Wu, Lucía Yan, Segura-Bedmar, Isabel
The rise of LLMs (Large Language Models) has contributed to the improved performance and development of cutting-edge NLP applications. However, LLMs can also pose risks when used maliciously, such as spreading fake news, harmful content, impersonating individuals, or facilitating school plagiarism, among others. This is because LLMs can generate high-quality texts that are challenging to differentiate from those written by humans. GLTR (Giant Language Model Test Room), developed jointly by the MIT-IBM Watson AI Lab and HarvardNLP, is a visual tool based on GPT-2 that helps detect machine-generated text by highlighting words according to the probability that they were machine-generated. One limitation of GLTR is that the results it returns can sometimes be ambiguous and lead to confusion. This study explores various ways to improve GLTR's effectiveness for detecting AI-generated texts within the context of the IberLef-AuTexTification 2023 shared task, in both English and Spanish. Experimental results show that our GLTR-based GPT-2 model outperforms the state-of-the-art models on the English dataset with a macro F1-score of 80.19%, surpassed only by the first-ranked model (80.91%). For the Spanish dataset, however, we obtained a macro F1-score of 66.20%, 4.57% behind the top-performing model.
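A rough sketch of the GLTR-style rank features that the approach above builds on, assuming the Hugging Face `transformers` GPT-2 interface; the top-10/100/1000 buckets follow the original GLTR tool, and the downstream classifier that consumes these features is left out.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def gltr_rank_features(text: str) -> list[float]:
    """Fraction of tokens whose rank under GPT-2 falls in each GLTR bucket
    (top-10, top-100, top-1000, beyond); machine-generated text tends to
    concentrate in the top-10 bucket."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                     # (1, seq_len, vocab)
    ranks = []
    for pos in range(1, ids.shape[1]):
        order = torch.argsort(logits[0, pos - 1], descending=True)
        ranks.append((order == ids[0, pos]).nonzero().item())
    fracs = [sum(r < b for r in ranks) / max(len(ranks), 1) for b in (10, 100, 1000)]
    return fracs + [1.0 - fracs[-1]]                   # share outside the top-1000
```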
- Europe > Spain > Community of Madrid > Madrid (0.04)
- North America > United States (0.04)
- Europe > Monaco (0.04)
- (2 more...)
- Research Report > New Finding (0.66)
- Research Report > Promising Solution (0.48)
The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models
Recent works have demonstrated great success in pre-training large-scale autoregressive language models (e.g., GPT-3) on massive GPU clusters. To reduce wall-clock training time, a common practice is to increase the batch size and learning rate. However, such practice is often brittle and leads to a so-called stability-efficiency dilemma: increasing the batch sizes and learning rates improves training efficiency but can also result in training instability, leading to poor generalization accuracy or failed runs. To better understand this phenomenon, we conduct an in-depth analysis of large-scale pre-training experiments replicating the GPT-2 model on a public dataset. We find a strong correlation between training instability and extreme values of gradient variance.
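For reference, a minimal sketch of a sequence length warmup schedule of the kind the title refers to; the linear ramp and the truncation-based implementation are assumptions, not the paper's exact recipe.

```python
def warmup_seq_len(step: int, warmup_steps: int,
                   start_len: int = 128, full_len: int = 1024) -> int:
    """Linearly ramp the training sequence length from start_len to full_len."""
    if step >= warmup_steps:
        return full_len
    return int(start_len + (step / warmup_steps) * (full_len - start_len))

def truncate_batch(token_ids, step: int, warmup_steps: int):
    """Apply the schedule by truncating each sequence in the current batch."""
    return token_ids[:, : warmup_seq_len(step, warmup_steps)]
```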
Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation
Mansurov, Jonibek, Sakip, Akhmed, Aji, Alham Fikri
In this paper, we show that knowledge distillation can be subverted to manipulate language model benchmark scores, revealing a critical vulnerability in current evaluation practices. We introduce "Data Laundering," a three-phase process analogous to financial money laundering, that enables the covert transfer of benchmark-specific knowledge through seemingly legitimate intermediate training steps. Through extensive experiments with a 2-layer BERT student model, we show how this approach can achieve substantial improvements in benchmark accuracy (up to 75% on GPQA) without developing genuine reasoning capabilities. Notably, this method can be exploited intentionally or even unintentionally, as researchers may inadvertently adopt score-inflating knowledge distillation without realizing the implications. While our findings demonstrate the effectiveness of this technique, we present them as a cautionary tale highlighting the urgent need for more robust evaluation methods in AI. This work aims to contribute to the ongoing discussion about evaluation integrity in AI development and the need for benchmarks that more accurately reflect true model capabilities. The code is available at https://github.com/mbzuai-nlp/data_laundering.
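A minimal sketch of the knowledge-distillation step that such a pipeline rests on, assuming PyTorch; the temperature and loss weighting are generic defaults, not the paper's settings.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend a soft-target KL term against the teacher with the ordinary CE loss.
    A teacher trained (directly or indirectly) on benchmark data leaks that
    knowledge into the student through the soft targets alone."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1.0 - alpha) * ce
```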
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
- Europe > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (3 more...)
AERO: Softmax-Only LLMs for Efficient Private Inference
Jha, Nandan Kumar, Reagen, Brandon
The pervasiveness of proprietary language models has raised privacy concerns for users' sensitive data, emphasizing the need for private inference (PI), where inference is performed directly on encrypted inputs. However, current PI methods face prohibitively high communication and latency overheads, primarily due to nonlinear operations. In this paper, we present a comprehensive analysis of the role of nonlinearities in transformer-based decoder-only language models. We introduce AERO, a four-step architectural optimization framework that refines the existing LLM architecture for efficient PI by systematically removing nonlinearities such as LayerNorm and GELU and reducing FLOP counts. For the first time, we propose a Softmax-only architecture with significantly fewer FLOPs, tailored for efficient PI. Furthermore, we devise a novel entropy regularization technique to improve the performance of Softmax-only models. AERO achieves up to 4.23× communication and 1.94× latency reduction. We validate the effectiveness of AERO by benchmarking it against the state of the art.
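A rough sketch of what a Softmax-only decoder block could look like under the constraints described above (no LayerNorm, no GELU), with a generic entropy term computed on the attention weights; the actual AERO architecture, its four optimization steps, and the exact form of its entropy regularizer are not reproduced here.

```python
import torch.nn as nn

class SoftmaxOnlyBlock(nn.Module):
    """Decoder block whose only nonlinearity is the attention softmax:
    no LayerNorm, no GELU, just linear projections and residual adds."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)          # linear stand-in for the FFN

    def forward(self, x, attn_mask=None):
        attn_out, attn_weights = self.attn(x, x, x, attn_mask=attn_mask,
                                           need_weights=True)
        x = x + attn_out
        x = x + self.proj(x)
        # generic attention-entropy term; a caller could add it to the training
        # loss to regularize how peaked the attention distributions become
        entropy = -(attn_weights.clamp_min(1e-9).log() * attn_weights).sum(-1).mean()
        return x, entropy
```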
- North America > United States > Virginia (0.04)
- North America > United States > New York (0.04)
- North America > Mexico > Gulf of Mexico (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Beyond Human-Like Processing: Large Language Models Perform Equivalently on Forward and Backward Scientific Text
Luo, Xiaoliang, Ramscar, Michael, Love, Bradley C.
The impressive performance of large language models (LLMs) has led to their consideration as models of human language processing. Instead, we suggest that the success of LLMs arises from the flexibility of the transformer learning architecture. To evaluate this conjecture, we trained LLMs on scientific texts presented in either a forward or backward format. Despite backward text being inconsistent with the structure of human languages, we found that LLMs performed equally well in either format on a neuroscience benchmark, eclipsing human expert performance for both forward and backward orders. Our results are consistent with the success of transformers across diverse domains, such as weather prediction and protein design. This widespread success is attributable to LLMs' ability to extract predictive patterns from any sufficiently structured input. Given their generality, we suggest caution in interpreting LLMs' success on linguistic tasks as evidence for human-like mechanisms.
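A minimal sketch of the forward vs. backward conditions described above: the backward condition simply reverses the token order of the same text before language-model training; the tokenizer choice and token-level (rather than character-level) reversal are assumptions.

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def make_example(text: str, backward: bool) -> list[int]:
    """Tokenize a passage; for the backward condition, reverse the token
    sequence so the model learns right-to-left statistics of the same text."""
    ids = tokenizer(text).input_ids
    return ids[::-1] if backward else ids
```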
- North America > Canada (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (2 more...)
Loop Neural Networks for Parameter Sharing
The success of large-scale language models like GPT can be attributed to their ability to efficiently predict the next token in a sequence. However, these models expend constant computational effort regardless of the complexity of the token they are predicting, lacking the capacity for iterative refinement. In this paper, we introduce a novel Loop Neural Network, which achieves better performance by spending more computation time without increasing the model size. Our approach revisits the input multiple times, refining the prediction by iteratively looping over a subset of the model with residual connections. We demonstrate the effectiveness of this method through experiments comparing versions of GPT-2 with our loop models, showing improved performance in language modeling tasks while maintaining similar parameter counts. Importantly, these improvements are achieved without the need for extra training data.
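A minimal sketch of the looping idea described above, assuming PyTorch and a generic transformer block; the number of iterations and which sub-layers are shared are illustrative choices, not the paper's configuration.

```python
import torch.nn as nn

class LoopBlock(nn.Module):
    """Reuse one block several times with residual connections, spending more
    computation per token without adding any parameters."""
    def __init__(self, block: nn.Module, n_loops: int = 4):
        super().__init__()
        self.block = block                  # weights shared across all iterations
        self.n_loops = n_loops

    def forward(self, x):
        for _ in range(self.n_loops):
            x = x + self.block(x)           # revisit and refine the residual stream
        return x
```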
Comparing Unidirectional, Bidirectional, and Word2vec Models for Discovering Vulnerabilities in Compiled Lifted Code
McCully, Gary A., Hastings, John D., Xu, Shengjie, Fortier, Adam
Ransomware and other forms of malware cause significant financial and operational damage to organizations by exploiting long-standing and often difficult-to-detect software vulnerabilities. To detect vulnerabilities such as buffer overflows in compiled code, this research investigates the application of unidirectional transformer-based embeddings, specifically GPT-2. Using a dataset of LLVM functions, we trained a GPT-2 model to generate embeddings, which were subsequently used to build LSTM neural networks that differentiate between vulnerable and non-vulnerable code. Our study reveals that embeddings from the GPT-2 model significantly outperform those from the bidirectional models BERT and RoBERTa, achieving an accuracy of 92.5% and an F1-score of 89.7%. LSTM neural networks were developed with both frozen and unfrozen embedding model layers; the highest performance was achieved when the embedding layers were unfrozen. Further, in exploring the impact of different optimizers within this domain, the research finds that the SGD optimizer outperforms Adam. Overall, these findings reveal important insights into the potential of unidirectional transformer-based approaches for enhancing cybersecurity defenses.
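A rough sketch of the embedding-plus-LSTM pipeline the abstract describes, assuming Hugging Face GPT-2 hidden states as the token embeddings; the layer sizes, the binary head, and the use of the stock `gpt2` checkpoint (rather than one trained on lifted LLVM code) are assumptions.

```python
import torch.nn as nn
from transformers import GPT2Model

class VulnClassifier(nn.Module):
    """LSTM over GPT-2 token embeddings with a binary vulnerable/benign head."""
    def __init__(self, hidden: int = 256, freeze_encoder: bool = False):
        super().__init__()
        self.encoder = GPT2Model.from_pretrained("gpt2")      # stand-in encoder
        self.encoder.requires_grad_(not freeze_encoder)       # frozen vs. unfrozen layers
        self.lstm = nn.LSTM(self.encoder.config.n_embd, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, input_ids, attention_mask=None):
        emb = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        _, (h_n, _) = self.lstm(emb)
        return self.head(h_n[-1])           # logits: vulnerable vs. non-vulnerable
```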
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Arizona > Pima County > Tucson (0.14)
- Asia > Nepal (0.04)
- (3 more...)